A Sandhi Splitter for Malayalam

Authors

  • Devadath V. V
  • Litton J. Kurisinkel
  • Dipti Misra Sharma
  • Vasudeva Varma
Abstract

Sandhi splitting is the primary task in the computational processing of text in Sanskrit and the Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining; this phenomenon is known as Sandhi. A sandhi splitter splits a string of conjoined words into its individual words. Accurate sandhi splitting is crucial for text processing tasks such as POS tagging, topic modelling and document indexing. We tried different approaches to address the challenges of sandhi splitting in Malayalam and finally chose to exploit the phonological changes that take place in words when they join. This resulted in a hybrid method that statistically identifies the split points and then splits the string using predefined character-level linguistic rules. Currently, our system achieves an accuracy of 91.1%.
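The two-stage idea (statistically locate split points, then undo the sound change at each point with a character-level rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the character-window context, the relative-frequency estimate, the threshold parameter, the simplified ASCII transliteration, and the single glide-deletion rule in apply_rules (undoing the "y" inserted between vowels, as in "ana" + "undu" giving "anayundu") are all assumptions made for illustration, and every function name here is hypothetical.

```python
from collections import Counter

VOWELS = set("aeiou")  # simplified ASCII vowel set (assumption, not real Malayalam script)


def context(word, i, k=2):
    """Character window of width k on each side of candidate split point i."""
    return word[max(0, i - k):i] + "|" + word[i:i + k]


def train(examples, k=2):
    """Count how often each context occurs and how often it is a split point.

    examples: list of (surface_string, set_of_split_indices) pairs.
    """
    seen, split = Counter(), Counter()
    for word, splits in examples:
        for i in range(1, len(word)):
            c = context(word, i, k)
            seen[c] += 1
            if i in splits:
                split[c] += 1
    return seen, split


def split_points(word, seen, split, k=2, threshold=0.5):
    """Stage 1 (statistical): keep positions whose estimated P(split | context) > threshold."""
    points = []
    for i in range(1, len(word)):
        c = context(word, i, k)
        # Counter returns 0 for unseen contexts, so they are never chosen.
        if seen[c] and split[c] / seen[c] > threshold:
            points.append(i)
    return points


def apply_rules(left, right):
    """Stage 2 (rule-based), hypothetical rule: vowel + 'y' + vowel -> vowel + vowel."""
    if left[-1:] in VOWELS and right[:1] == "y" and right[1:2] in VOWELS:
        right = right[1:]  # drop the inserted glide from the start of the right word
    return left, right


def sandhi_split(word, seen, split, k=2, threshold=0.5):
    """Cut the string at the predicted points, then repair each junction with the rules."""
    bounds = [0] + split_points(word, seen, split, k, threshold) + [len(word)]
    segments = [word[a:b] for a, b in zip(bounds, bounds[1:])]
    for j in range(len(segments) - 1):
        segments[j], segments[j + 1] = apply_rules(segments[j], segments[j + 1])
    return segments


if __name__ == "__main__":
    # Toy training data: one annotated compound, split after "ana".
    seen, split = train([("anayundu", {3})])
    print(sandhi_split("panayundu", seen, split))  # ['pana', 'undu']
```

Trained on the single annotated form "anayundu", the sketch recognises the same "na|yu" junction in the unseen form "panayundu" and restores "undu" by deleting the glide. Keeping the statistics (where to cut) separate from the rules (how to restore the words) mirrors the hybrid split described in the abstract; a realistic system would need a large annotated corpus, smoothed probabilities and a full rule table.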

Similar resources

Statistical Sandhi Splitter for Agglutinative Languages

Sandhi splitting is a primary and an important step for any natural language processing (NLP) application for languages which have agglutinative morphology. This paper presents a statistical approach to build a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language and the output is a split of that string into meaningful word/s. The approach adopte...

Statistical Sandhi Splitter and its Effect on NLP Applications

This paper revisits the work of (Kuncham et al., 2015) which developed a statistical sandhi splitter (SSS) for agglutinative languages that was tested for Telugu and Malayalam languages. Handling compound words is a major challenge for Natural Language Processing (NLP) applications for agglutinative languages. Hence, in this paper we concentrate on testing the effect of SSS on the NLP applicati...

Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages

This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the individual word iden...

Design of Photonic Crystal Polarization Splitter on InP Substrate

In this article, we propose a novel design of a polarization splitter based on a coupler waveguide on an InP substrate at a wavelength of 1.55 μm. The photonic crystal structure consists of two-dimensional (2D) air holes, arranged in a hexagonal lattice, embedded in InP/InGaAsP material with an effective refractive index of 3.2634. The photonic band gap (PBG) of this structure is determined using t...

Experimental Investigation of the Effect of Splitter Plate Angle on the Under-Scouring of Submarine Pipeline Due to Steady Current and Clear Water Condition

Submarine pipelines are an appropriate means of transporting oil and gas from the seabed. Free spans may occur due to a naturally uneven seabed or to under-scouring. Vortex-induced vibration (VIV) can occur in such free spans at high Reynolds numbers. Resonance occurs if the vortex-shedding frequency is close to the pipeline's natural frequency, leading to fatigue that can break the pipeli...

Publication date: 2014